Human Action Recognition using Improved Vector of Locally Aggregated Descriptors
Abstract
Two high-dimensional encoding techniques, the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD), have recently been widely employed for human action recognition. In this study, a new human action recognition approach using improved VLAD with localized soft assignment (LSA) and second-order statistics is proposed. When encoding videos into VLAD, instead of considering only the nearest visual word, we utilize localized soft assignment, i.e., we consider multiple nearest visual words. In general, LSA-VLAD captures only the first-order statistics of descriptors and visual words. In this study, LSA and second-order statistics are also encoded into a VLAD-like form, namely LSA2-VLAD. In the experiments reported in this study, the proposed approach combining LSA-VLAD and LSA2-VLAD achieves a higher average accuracy than 10 comparison approaches.
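The abstract does not give the encoding formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each descriptor is assigned to its `k` nearest visual words with Gaussian-style weights normalized over those `k` words, and that second-order statistics are approximated by element-wise squared residuals. The function names, the `beta` smoothing parameter, the `order` switch, and the final L2 normalization are all hypothetical choices made for illustration.

```python
import math


def k_nearest(x, codebook, k):
    """Return (index, squared distance) pairs for the k visual words closest to x."""
    dists = [
        (i, sum((a - b) ** 2 for a, b in zip(x, c)))
        for i, c in enumerate(codebook)
    ]
    return sorted(dists, key=lambda pair: pair[1])[:k]


def lsa_vlad(descriptors, codebook, k=2, beta=1.0, order=1):
    """Encode descriptors into a VLAD-like vector with localized soft assignment.

    order=1 aggregates weighted residuals (first-order statistics).
    order=2 aggregates weighted element-wise squared residuals, which is an
    ASSUMED stand-in for the second-order statistics named in the abstract.
    """
    dim = len(codebook[0])
    blocks = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        nearest = k_nearest(x, codebook, k)
        # Shift by the smallest distance so the exponentials stay well scaled.
        d_min = nearest[0][1]
        weights = [math.exp(-beta * (d - d_min)) for _, d in nearest]
        total = sum(weights)
        for (i, _), w in zip(nearest, weights):
            c = codebook[i]
            for j in range(dim):
                blocks[i][j] += (w / total) * (x[j] - c[j]) ** order
    # Flatten the per-word blocks and apply L2 normalization (an assumed choice).
    flat = [v for block in blocks for v in block]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

With `k=1` the sketch reduces to ordinary hard-assignment VLAD. One possible way to combine the two encodings, again an assumption rather than something stated in the abstract, is to concatenate `lsa_vlad(..., order=1)` and `lsa_vlad(..., order=2)` before classification.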
Similar resources
Hybrid Super Vector with Improved Dense Trajectories for Action Recognition
With recent improved dense trajectory features (HOG, warped HOF, and warped MBH), we employ two advanced super vector methods, namely Fisher Vector (FV) and soft Vector of Locally Aggregated Descriptors (VLAD-K) to encode them separately. The two individual super vectors are concatenated into a Hybrid Super Vector, and a linear SVM classifier is used to predict labels. We achieve 87.46% in ave...
Action recognition via spatio-temporal local features: A comprehensive study
Local methods based on spatio-temporal interest points (STIPs) have shown their effectiveness for human action recognition. The bag-of-words (BoW) model has been widely used and has dominated this field. Recently, a large number of techniques based on local features including improved variants of the BoW model, sparse coding (SC), Fisher kernels (FK), vector of locally aggregated descriptors (VL...
Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics
Recent studies show that aggregating local descriptors into a super vector yields an effective representation for retrieval and classification tasks. A popular method along this line is the vector of locally aggregated descriptors (VLAD), which aggregates the residuals between descriptors and visual words. However, the original VLAD ignores high-order statistics of local descriptors and its dictionary may n...
Study of Human Action Recognition Based on Improved Spatio-temporal Features
Most existing action recognition methods mainly utilize spatio-temporal descriptors of a single interest point, ignoring their potential integral information, such as spatial distribution information. By combining local spatio-temporal features and global positional distribution information (PDI) of interest points, a novel motion descriptor is proposed in this paper. The proposed method detec...
Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
Encoding is one of the key factors for building an effective video representation. In recent works, super vector-based encoding approaches are highlighted as one of the most powerful representation generators. Vector of Locally Aggregated Descriptors (VLAD) is one of the most widely used super vector methods. However, one of the limitations of VLAD encoding is the lack of spatial informatio...